Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Security: fix joining cluster with production license #31341

Merged
merged 2 commits into from
Jun 19, 2018

Conversation

jaymode
Copy link
Member

@jaymode jaymode commented Jun 14, 2018

The changes made to disable security for trial licenses unless security
is explicitly enabled caused issues when a 6.3 node attempts to join a
cluster that already has a production license installed. The new node
starts off with a trial licenses and xpack.security.enabled is not
set for the node, which causes the security code to skip attaching the
user to the request. The existing cluster has security enabled and the
lack of a user attached to the requests causes the request to be
rejected.

This commit changes the security code to check if the state has been
recovered yet when making the decision on whether or not to attach a
user. If the state has not yet been recovered, the code will attach
the user to the request in case security is enabled on the cluster
being joined.

Closes #31332

The changes made to disable security for trial licenses unless security
is explicitly enabled caused issues when a 6.3 node attempts to join a
cluster that already has a production license installed. The new node
starts off with a trial licenses and `xpack.security.enabled` is not
set for the node, which causes the security code to skip attaching the
user to the request. The existing cluster has security enabled and the
lack of a user attached to the requests causes the request to be
rejected.

This commit changes the security code to check if the state has been
recovered yet when making the decision on whether or not to attach a
user. If the state has not yet been recovered, the code will attach
the user to the request in case security is enabled on the cluster
being joined.

Closes elastic#31332
@jaymode jaymode added >bug v7.0.0 :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v6.4.0 v6.3.1 labels Jun 14, 2018
@jaymode jaymode requested review from tvernum and ywelsch June 14, 2018 18:49
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security

@@ -254,7 +254,8 @@

public XPackLicenseState(Settings settings) {
this.isSecurityEnabled = XPackSettings.SECURITY_ENABLED.get(settings);
this.isSecurityExplicitlyEnabled = settings.hasValue(XPackSettings.SECURITY_ENABLED.getKey()) && isSecurityEnabled;
this.isSecurityExplicitlyEnabled = (isSecurityEnabled && settings.hasValue(XPackSettings.SECURITY_ENABLED.getKey())) ||
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a suggestion from @bleskes that I went ahead and adopted as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the reasoning behind doing this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for cases where a user with a production license on 6.0 to 6.2.x upgrades to 6.3.1+. We require TLS for production mode, so if TLS is enabled we can take that as security being explicitly enabled.

Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question about the choice to always send auth if the state is not recovered (irrespective of the node settings)

@@ -98,7 +104,9 @@ public AsyncSender interceptSender(AsyncSender sender) {
@Override
public <T extends TransportResponse> void sendRequest(Transport.Connection connection, String action, TransportRequest request,
TransportRequestOptions options, TransportResponseHandler<T> handler) {
if (licenseState.isSecurityEnabled() && licenseState.isAuthAllowed()) {
final boolean stateNotRecovered = isStateNotRecovered;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purposes of this assignment (from volatile to local) initially confused me, and I'm worried someone might think it is redundant and remove it in the future. Can we add a comment to try and discourage that from happening?

@@ -98,7 +104,9 @@ public AsyncSender interceptSender(AsyncSender sender) {
@Override
public <T extends TransportResponse> void sendRequest(Transport.Connection connection, String action, TransportRequest request,
TransportRequestOptions options, TransportResponseHandler<T> handler) {
if (licenseState.isSecurityEnabled() && licenseState.isAuthAllowed()) {
final boolean stateNotRecovered = isStateNotRecovered;
final boolean sendWithAuth = (licenseState.isSecurityEnabled() && licenseState.isAuthAllowed()) || stateNotRecovered;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that if the state is not recovered, then we're going to sendWithAuth even if the local node explitily disables security.
If so, then why have the check at all? If it's safe to send the auth context it even if security is disabled, isn't it simpler to just send it for every outgoing request?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thought as @tvernum here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interceptor is not registered at all if security is disabled.

@Override
public List<TransportInterceptor> getTransportInterceptors(NamedWriteableRegistry namedWriteableRegistry, ThreadContext threadContext) {
if (transportClientMode || enabled == false) { // don't register anything if we are not enabled
// interceptors are not installed if we are running on the transport client
return Collections.emptyList();
}
return Collections.singletonList(new TransportInterceptor() {
@Override
public <T extends TransportRequest> TransportRequestHandler<T> interceptHandler(String action, String executor,
boolean forceExecution,
TransportRequestHandler<T> actualHandler) {
assert securityInterceptor.get() != null;
return securityInterceptor.get().interceptHandler(action, executor, forceExecution, actualHandler);
}
@Override
public AsyncSender interceptSender(AsyncSender sender) {
assert securityInterceptor.get() != null;
return securityInterceptor.get().interceptSender(sender);
}
});
}

This change limits to only sending auth when security could possibly be enabled. Once we've recovered the state then the check is based upon the license. By having the check we maintain the ability to assert that all outgoing requests have authentication after the state has been recovered.

Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jaymode jaymode merged commit dc57eec into elastic:master Jun 19, 2018
@jaymode jaymode deleted the fix_join_existing branch June 19, 2018 18:00
jaymode added a commit that referenced this pull request Jun 19, 2018
The changes made to disable security for trial licenses unless security
is explicitly enabled caused issues when a 6.3 node attempts to join a
cluster that already has a production license installed. The new node
starts off with a trial license and `xpack.security.enabled` is not
set for the node, which causes the security code to skip attaching the
user to the request. The existing cluster has security enabled and the
lack of a user attached to the requests causes the request to be
rejected.

This commit changes the security code to check if the state has been
recovered yet when making the decision on whether or not to attach a
user. If the state has not yet been recovered, the code will attach
the user to the request in case security is enabled on the cluster
being joined.

Closes #31332
jaymode added a commit that referenced this pull request Jun 19, 2018
The changes made to disable security for trial licenses unless security
is explicitly enabled caused issues when a 6.3 node attempts to join a
cluster that already has a production license installed. The new node
starts off with a trial license and `xpack.security.enabled` is not
set for the node, which causes the security code to skip attaching the
user to the request. The existing cluster has security enabled and the
lack of a user attached to the requests causes the request to be
rejected.

This commit changes the security code to check if the state has been
recovered yet when making the decision on whether or not to attach a
user. If the state has not yet been recovered, the code will attach
the user to the request in case security is enabled on the cluster
being joined.

Closes #31332
dnhatn added a commit that referenced this pull request Jun 20, 2018
* 6.x:
  [DOCS] Omit shard failures assertion for incompatible responses  (#31430)
  [DOCS] Move licensing APIs to docs (#31445)
  backport of: add is-write-index flag to aliases (#30942) (#31412)
  backport of: Add rollover-creation-date setting to rolled over index (#31144) (#31413)
  [Docs] Extend Homebrew installation instructions (#28902)
  [Docs] Mention ip_range datatypes on ip type page (#31416)
  Multiplexing token filter (#31208)
  Fix use of time zone in date_histogram rewrite (#31407)
  Revert "Mute DefaultShardsIT#testDefaultShards test"
  [DOCS] Fixes code snippet testing for machine learning (#31189)
  Security: fix joining cluster with production license (#31341)
  [DOCS] Updated version in Info API example
  [DOCS] Moves the info API to docs (#31121)
  Revert "Increasing skip version for failing test on 6.x"
  Preserve response headers on cluster update task (#31421)
  [DOCS] Add code snippet testing for more ML APIs (#31404)
  Docs: Advice for reindexing many indices (#31279)
dnhatn added a commit that referenced this pull request Jun 20, 2018
* master:
  [DOCS] Omit shard failures assertion for incompatible responses  (#31430)
  [DOCS] Move licensing APIs to docs (#31445)
  Add Delete Snapshot High Level REST API
  Remove QueryCachingPolicy#ALWAYS_CACHE (#31451)
  [Docs] Extend Homebrew installation instructions (#28902)
  Choose JVM options ergonomically
  [Docs] Mention ip_range datatypes on ip type page (#31416)
  Multiplexing token filter (#31208)
  Fix use of time zone in date_histogram rewrite (#31407)
  Core: Remove index name resolver from base TransportAction (#31002)
  [DOCS] Fixes code snippet testing for machine learning (#31189)
  [DOCS] Removed  and  params from MLT. Closes #28128 (#31370)
  Security: fix joining cluster with production license (#31341)
  Unify http channels and exception handling (#31379)
  [DOCS] Moves the info API to docs (#31121)
  Preserve response headers on cluster update task (#31421)
  [DOCS] Add code snippet testing for more ML APIs (#31404)
  Do not preallocate bytes for channel buffer (#31400)
  Docs: Advice for reindexing many indices (#31279)
  Mute HttpExporterTests#testHttpExporterShutdown test Tracked by #31433
  Docs: Add note about removing prepareExecute from the java client (#31401)
  Make release notes ignore the `>test-failure` label. (#31309)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker >bug :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v6.3.1 v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants